Project-Team:STARS

Inria | Raweb 2017 | Presentation of the Project-Team STARS | STARS Web Site



Section: New Results

Deep Learning applied on Embedded Systems for people detection

Participants : Juan Diego Gonzales Zuniga, Ujjwal Ujjwal, François Brémond.

keywords: Deep learning, CNN, Embedded Systems

Introduction

One of the problems with people detection is the amount of resources required for high-quality results. Most architectures need either a large amount of memory or long computation times to reach state-of-the-art performance, and these results are mostly obtained with dedicated hardware in data centers. The applications for embedded hardware with these capabilities are wide-ranging: automotive, security and surveillance, augmented reality, and healthcare, to name a few. However, state-of-the-art architectures focus more on accuracy than on resource consumption [74] [75] [140].

The popularity of deep learning invites us to explore high-performance algorithms. In our work, we must both improve the system's accuracy and reduce its resource usage for real-time people detection. This should lead to new and efficient deep learning solutions.

State-of-the-art investigations

Deep learning lacks a strong theoretical background, and a significant part of the knowledge is gained by investigating existing systems [75] [92] [140]. To better grasp their behavior across a range of scenarios, we started our investigation by studying architectures that perform several tasks at once. The combination of detection and segmentation shed light on how the two tasks mutually improve people detection, as seen in [67] and [50].

Another key part of our investigation was to experiment with fast architectures such as [109], which takes less time than [75] [92] [140] while remaining competitive and fairly flexible.

Our investigations yielded the following insights:

Performance limitations of current systems: We identified a number of important scenarios in which present state-of-the-art systems falter in their detection performance. These primarily include the following:
- Loss function: This is the feedback signal fed back into the network during training. Different, more complex loss functions yield different results. In other words, what matters is less the quantity of training samples than the way the network is trained on them.
- Usage of filters: Deep learning architectures commonly use small filter sizes; this improves the field of view over an image but also increases the number of parameters to control.
- Time-Computation: Most high-performance architectures double the work in order to refine their previous results. The accuracy of the solution is undeniable, but the computation cost and memory usage also increase.
Suboptimal usage of CNN architectures: (see section 7.3).
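
The trade-off noted under "Usage of filters" can be made concrete with standard convolution arithmetic. The following is a minimal sketch, not the authors' code: it assumes square, stride-1, undilated filters and equal input and output channel counts, and the function names and the choice of 64 channels are purely illustrative.

```python
def conv_params(kernel, in_ch, out_ch, bias=True):
    """Number of weights (plus optional biases) in one square 2-D convolution layer."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)


def receptive_field(kernels):
    """Field of view, in pixels per side, of stacked stride-1, undilated convolutions."""
    rf = 1
    for k in kernels:
        rf += k - 1  # each layer widens the view by (kernel - 1) pixels
    return rf


def stack_summary(kernel, depth, channels):
    """Return (field of view, total parameters) for `depth` identical layers."""
    kernels = [kernel] * depth
    params = sum(conv_params(k, channels, channels) for k in kernels)
    return receptive_field(kernels), params


if __name__ == "__main__":
    # Illustrative setting: 3x3 filters, 64 channels per layer (assumed values).
    for depth in (1, 2, 3):
        rf, params = stack_summary(3, depth, 64)
        print(f"{depth} x 3x3 layers: field of view {rf}x{rf}, {params} parameters")
```

Under these assumptions, every additional small-filter layer widens the field of view but also brings its own set of weights, so field of view and parameter count grow together.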

Outcome

Our investigations have shown us that we should focus more on the quality of training than on its quantity. This lets us concentrate on the relevant parts of system design. We expect this to give us clues on how to increase accuracy without increasing resource usage.

Following these state-of-the-art studies, we plan to consolidate our findings in a review paper, which we aim to submit to a journal shortly.

This work has been done in collaboration with Serge Tissot (Kontron).
